Word-length Entropies and Correlations of Natural Language Written Texts



Similar resources

Word-length entropies and correlations of natural language written texts

We study the frequency distributions and correlations of the word lengths of ten European languages. Our findings indicate that a) the word-length distribution of short words quantified by the mean value and the entropy distinguishes the Uralic (Finnish) corpus from the others, b) the tails at long words, manifested in the high-order moments of the distributions, differentiate the Germanic lang...


Entropy analysis of word-length series of natural language texts: Effects of text language and genre

We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre. We attribute this sensitivity to changes in the probability distribution of the lengths of single words and emphasize the crucial role of the uniformity of probabilities of having words with length between five and ten. Furthermore, comparison wi...
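As a rough illustration of what an n-gram entropy in word-length representation might look like, the sketch below maps a text to its sequence of word lengths and computes the Shannon entropy of overlapping n-grams of that sequence. The function names, the tokenization rule (runs of letters), and the plug-in frequency estimator are assumptions made for this example; the abstract does not specify the authors' estimator or preprocessing.

```python
import math
import re
from collections import Counter


def word_lengths(text):
    """Map a text to its word-length series.

    Assumption: a word is a maximal run of Unicode letters, so
    "don't" is split into "don" and "t". The source does not state
    its tokenization rule.
    """
    return [len(w) for w in re.findall(r"[^\W\d_]+", text)]


def block_entropy(series, n=1):
    """Plug-in Shannon entropy (in bits) of overlapping n-grams.

    With n=1 this is the entropy of the single word-length
    distribution; larger n also captures correlations between
    neighbouring word lengths.
    """
    if n < 1 or len(series) < n:
        raise ValueError("need n >= 1 and at least n items in the series")
    grams = Counter(tuple(series[i:i + n]) for i in range(len(series) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())


if __name__ == "__main__":
    lengths = word_lengths("We estimate the n-gram entropies of natural language texts")
    for n in (1, 2):
        print(n, round(block_entropy(lengths, n), 3))
```

The plug-in estimator is biased downward for short texts and large n, so comparisons across languages or genres would normally use texts of similar length.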


Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis

We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, ...
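The conversion of a series into a natural visibility graph can be sketched as follows. Each position in the word-length series becomes a node, and two nodes are linked when the straight line between their values passes strictly above every value in between (the standard natural visibility criterion). This is a naive illustrative version with a hypothetical function name, not the implementation used in the study, and it is too slow for book-length series.

```python
def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph of a series.

    Nodes are the indices 0..len(series)-1. Nodes a < b are connected
    when every intermediate point c satisfies
        y_c < y_b + (y_a - y_b) * (b - c) / (b - a),
    i.e. it lies strictly below the line joining (a, y_a) and (b, y_b).
    Naive cubic-time sketch, intended only for short examples.
    """
    n = len(series)
    edges = set()
    for a in range(n - 1):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


def degrees(edges, n):
    """Degree of each node, a basic graph property one could analyze."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    lengths = [3, 1, 2, 5, 2]  # made-up word lengths for illustration
    g = natural_visibility_graph(lengths)
    print(sorted(g))
    print(degrees(g, len(lengths)))
```

Adjacent positions are always connected, since there is no intermediate point to block the view.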


Distinct word length frequencies: distributions and symbol entropies

The distribution of frequency counts of distinct words by length in a language’s vocabulary will be analyzed using two methods. The first will look at the empirical distributions of several languages and derive a distribution that reasonably explains the number of distinct words as a function of length. We will be able to derive the frequency count, mean word length, and variance of word lengt...


A framework for conflict analysis of normative texts written in controlled natural language

In this paper we are concerned with the analysis of normative conflicts, or the detection of conflicting obligations, permissions and prohibitions in normative texts written in a Controlled Natural Language (CNL). For this we present AnaCon, a proof-of-concept system where normative texts written in CNL are automatically translated into the formal language CL using the Grammatical Framework (GF...



Journal

Journal title: Journal of Quantitative Linguistics

Year: 2015

ISSN: 0929-6174,1744-5035

DOI: 10.1080/09296174.2014.1001636